Punctuating confusion networks for speech translation

Authors

  • Roldano Cattoni
  • Nicola Bertoldi
  • Marcello Federico
Abstract

Translating from confusion networks (CNs) has been shown to be more effective than translating from single-best hypotheses. Moreover, it is widely accepted that good punctuation marks in the input can improve translation quality. At present, no ASR system can generate punctuation marks in its word graphs, so CNs lack punctuation. In this paper we investigate the problem of adding punctuation marks to confusion networks. We explore different punctuation strategies and show that the use of multiple hypotheses improves translation quality in a large-vocabulary speech translation task.

Similar resources

Edinburgh SLT and MT System Description for the IWSLT 2013 Evaluation

This paper gives a description of the University of Edinburgh’s (UEDIN) systems for IWSLT 2013. We participated in all the MT tracks and the German-to-English and English-to-French SLT tracks. Our SLT submissions experimented with including ASR uncertainty into the decoding process via confusion networks, and looked at different ways of punctuating ASR output. Our MT submissions are mainly based...

Phonetic Representation-Based Speech Translation

This paper explores a tight coupling of Automatic Speech Recognition (ASR) and Machine Translation (MT) for speech translation with information sharing on the phone level. Our novel approach allows MT to access fine-grained phonetic information from ASR, as a methodology for facilitating speech translation. Specifically, Phrase-based Statistical MT (PBSMT) models are adapted to work on source la...

Recent Advances in Spoken Language Translation

The talk is structured in three parts. The first part overviews problems and approaches to spoken language translation. The second part presents challenges and achievements of the European Project TC-STAR, that ended in 2007. The third part describes advances in the use of confusion networks as interface between automatic speech recognition and machine translation. In particular, I will discuss...

The IRST English-Spanish translation system for European Parliament speeches

This paper presents the spoken language translation system developed at FBK-irst during the TC-STAR project. The system integrates automatic speech recognition with machine translation through the use of confusion networks, which make it possible to represent a huge number of transcription hypotheses generated by the speech recognizer. Confusion networks are efficiently decoded by a statistical machine t...

Fast speech decoding through phone confusion networks

We present a two stage automatic speech recognition architecture suited for applications, such as spoken document retrieval, where large scale language models can be used and very low out-of-vocabulary rates need to be reached. The proposed system couples a weakly constrained phone-recognizer with a phone-to-word decoder that was originally developed for phrase-based statistical machine transla...

Publication date: 2007